# require = will install and load, library can only load installed
# packages
require(tidyverse)
require(ggplot2) # pretty plots
require(dplyr) # dataframe manipulations and %>%
require(infer) # statistics
require(tidyr) # nice tables that look nice
require(car) # ANOVA, also used by the authors for ANOVA
require(mosaic) # histogram, instead of hist (contains {lattice})
require(formatR) #to format R
require(knitr) # inserting images
require(kableExtra) # modifying kable tables
require(skimr) # for data overview, data visualization
[Include a brief summary of the paper you are reanalyzing data from (e.g., the overall objective of the paper, the types of data collected and how sampling was done, what the main results were) and lay out what you intend to replicate.]
The southern Darwin’s frog (Rhinoderma darwinii) mouth-brooding frog species endemic to Chilean Patagonia. R. darwinii has a fascinating method for reproduction: the female (egg-producer) deposits eggs and the male (sperm-producer) fertilizes them, then guards them until the embryos are visibly wriggling inside. Then the male transfers those eggs to their vocal sac where they will be safe from predators and receive nutrients from the male (similar to marsupial embryonic development in their pouch). Ultimately, the male broods these fertilized eggs, often from multiple clutches with different partners, in their vocal sacs for about 50 days, until fully developed froglets emerge by pushing their way out the male’s mouth! All a while during this gestation period, the male continues to call for mates (Sandmeier, 2016)! There are 3 sex roles in this species: pregnant male (MP), non-pregnant male (M or NPM), and female (H or F), and the main sexually dimorphic characteristic is size, where the egg-producers (F) are slightly larger. In this nearly sexually monomorphic (i.e., the sexes look physically identical) species, mate selection depends on advertising calls, and R. darwinii is one of the few species where all sexes (PM, NPM, and F) use advertising calls to attract the attention of a potential mate (Serrano et al., 2020).
To investigate the factors modulating monomorphic vs. dimorphic sexual signals in the R. darwinii, Serrano et al. (2020) hypothesized species with temporally variable periods of intrasexual competition will have monomorphic instead of dimorphic sexual signals to ensure conspecific recognition. In R. darwinii, they specifically focus on timing/availability of advertising calls for mate attraction used by calling (pregnant and non-pregnant) males and calling females (i.e., vocal phenology) (Serrano et al., 2020).
They investigated this with a population of southern R. darwinii on Chiolé Island in Southern Chile during mating season (October 2015-February 2016). Of the 156 frogs caught in the field (118 adults), they recorded calls from 32 of them (Plot1) In the field, they first recorded individual calls (tracks) then collected population monitoring data (snout-vent length, SVL (mm); weight (g)). They also collected data on the sex and sexual status (MP = pregnant male, H = female, M = non-pregnant male) of each frog caught using body size and morphological characteristics. For the pregnant males, they counted externally the number of larvae in the vocal sacs. For each call recordings, they measured the call repetition rate (CRR, number of calls made in a 5 min period after the first call produced), the sound pressure level (SPL, dB), call duration (CD, seconds), the number of notes per call (NC), the note duration (ND, ms), the dominant frequency of the call (DF, Hz); and the amplitude of each vocalization (root mean square (RMS) amplitude) (Serrano et al., 2020).
The Serrano et al. (2020) used Simple Pearson correlations to explore the association used to explore association between acoustic variables of the calls (CRR, SPL, CD, NC, ND, DF) with physical characteristic variables (SVL, weight) (data not shown in article). They used Spearman correlations to determine the association between acoustic properties of the calls of pregnant males and the number of tadpoles in their vocal sacs (data not shown in article). The authors used an ANOVA to investigate the differences in acoustic variables depended on the relative body size of the sexes (Table 1), then they conducted a discriminant function analysis (DFA) for the note duration, dominant frequency, sound pressure level, and aggregated entropy of 4-notes calls (the most common call structure), then analyzed using two linear discriminant vectors (LD) (Figure 4) (Serrano et al., 2020).
Ultimately, Serrano et al. (2020) found females, who have larger body size (SVL and weight) than both male sexes, had longer note duration and call duration than any males (Table 1, Figure 3), and, even when data was corrected for SVL, no acoustic variable had a significant difference between the sexes, based on their ANOVA (Table 1), suggesting body size explains any variation among the acoustic variables and a monomorphic call structure. They also found a negative relationship between number of tadpoles in the vocal sacs of pregnant males and their call amplitude, indicating the advertising call of pregnant males is attenuated by an increased number of tadpoles AND could be an auditory indicator of tadpoles thus could be used for selecting a mate (data not shown; just very cool! This is something several other students and I studied with the first author of this paper in the field in 2019)! Ultimately, they propose mutual selective pressures of the different sexes might contribute to social recognition and sexual status recognition (i.e., recognition of reporoductive state, active or not) of individuals (Serrano et al., 2020).
I will be replicating the descriptive statistic analysis of the qqPlots, Simple Pearson Correlation, Simple Spearman correlation to explore normality trends in the data, just as the authors did, and ANOVA presented in Table 1 of Serrano et al. (2020). THen I will be replicating the Inferential statistical analysis of acoustic variables note duration, call duration, and dominant frequency (figure 3). Lastly, I will replicate the linear discrimination analysis of 5 acoustic variables (Dominant Frequency, Call Rate, Note Duration, Aggregated entropy, and Sound Pressure Level) (figure 4).
load in dataset:
f <- "https://raw.githubusercontent.com/slcornett/data-analysis-replication/main/data/Serrano_et_al_2020_MPMH.csv"
d <- read_csv(f, col_names = TRUE, show_col_types = FALSE) # show column names, hides dataframe message details
d <- d %>%
mutate(CRR = (Calls5min/5)) # creating calls per minute call repetition rate for ease in future calculations
# Data summary
s <- skim(d) # to make a summary table of skim(d) output
# character data
s %>% dplyr::filter(skim_type == "character") %>%
dplyr::select(skim_variable, n_missing, character.min, character.max, character.empty, character.n_unique) %>%
kable(align = 'l', booktabs = TRUE) %>%
kable_styling(font_size = 11, full_width = TRUE)
| skim_variable | n_missing | character.min | character.max | character.empty | character.n_unique |
|---|---|---|---|---|---|
| Nombre | 0 | 3 | 15 | 0 | 30 |
| Captura | 0 | 3 | 4 | 0 | 31 |
| Sex | 0 | 7 | 18 | 0 | 3 |
| Sexo | 0 | 1 | 2 | 0 | 3 |
| Track | 0 | 4 | 4 | 0 | 31 |
# numeric data modify the specific output
s %>% dplyr::filter(skim_type == "numeric") %>%
dplyr::select(skim_variable, n_missing, numeric.mean, numeric.sd, numeric.p0, numeric.p25, numeric.p50, numeric.p75, numeric.p100, numeric.hist) %>%
kable(align = 'l', booktabs = TRUE) %>% # left aligned data
kable_styling(font_size = 11, full_width = TRUE) # font-size
| skim_variable | n_missing | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
|---|---|---|---|---|---|---|---|---|---|
| Calls5min | 0 | 7.290323e+00 | 2.734880e+00 | 3.000000 | 5.000000 | 8.000000e+00 | 9.000000e+00 | 1.200000e+01 | ▅▅▇▅▅ |
| Temp | 22 | 1.938889e+01 | 3.077923e+00 | 15.400000 | 18.200000 | 1.900000e+01 | 2.030000e+01 | 2.560000e+01 | ▅▇▅▂▂ |
| HR | 22 | 7.883333e+01 | 4.238514e+00 | 71.500000 | 75.100000 | 7.920000e+01 | 8.210000e+01 | 8.420000e+01 | ▂▃▂▂▇ |
| Larvas | 20 | 3.090909e+00 | 1.221028e+00 | 2.000000 | 2.000000 | 3.000000e+00 | 4.000000e+00 | 5.000000e+00 | ▇▃▁▃▃ |
| LHC | 1 | 2.242667e+01 | 9.351132e-01 | 20.600000 | 21.800000 | 2.235000e+01 | 2.307500e+01 | 2.520000e+01 | ▂▇▅▂▁ |
| Peso | 1 | 9.356667e-01 | 1.389042e-01 | 0.680000 | 0.837500 | 9.200000e-01 | 1.045000e+00 | 1.190000e+00 | ▇▆▇▆▇ |
| SPL | 6 | 6.336860e+01 | 4.117350e+00 | 53.418182 | 61.700000 | 6.322143e+01 | 6.631667e+01 | 7.143103e+01 | ▂▃▇▇▃ |
| NC | 0 | 1.474194e+01 | 7.763022e+00 | 4.000000 | 8.000000 | 1.400000e+01 | 1.800000e+01 | 3.300000e+01 | ▆▇▃▃▂ |
| ND | 0 | 1.684006e-01 | 3.099900e-02 | 0.115131 | 0.142706 | 1.624118e-01 | 1.890287e-01 | 2.565455e-01 | ▅▇▃▃▁ |
| CD | 3 | 1.670783e+00 | 2.164759e-01 | 1.311454 | 1.510821 | 1.636583e+00 | 1.838234e+00 | 2.141000e+00 | ▅▇▅▃▃ |
| Agg Entropy | 0 | 2.106072e+00 | 5.376220e-01 | 1.379379 | 1.666791 | 1.942529e+00 | 2.477863e+00 | 3.161019e+00 | ▇▆▃▂▃ |
| Avg Entropy | 0 | 1.970192e+00 | 3.396755e-01 | 1.524621 | 1.723484 | 1.851720e+00 | 2.123599e+00 | 2.721850e+00 | ▇▇▃▃▂ |
| DF | 0 | 3.597235e+03 | 2.901571e+02 | 2995.193548 | 3389.708648 | 3.617600e+03 | 3.793418e+03 | 4.272157e+03 | ▂▆▇▅▂ |
| RMS Amp | 0 | 1.005410e+05 | 7.949927e+04 | 12731.900000 | 53886.150000 | 6.422570e+04 | 1.228883e+05 | 3.560384e+05 | ▇▃▁▁▁ |
| CRR | 0 | 1.458065e+00 | 5.469760e-01 | 0.600000 | 1.000000 | 1.600000e+00 | 1.800000e+00 | 2.400000e+00 | ▅▅▇▅▅ |
detach(package:skimr)
DESCRIPTIONS/IDENTIFIERS:
• Nombre = name given to ID the frog;
• Captura = number captured/image name;
• Track = the call recording track
PHYSICAL MEASUREMENTS:
• sexo = sexual status (MP = pregnant male, H = female, M = non-pregnant male)
• peso = mass (g);
• LHC = longitud hocico a cola (snout-vent length, mm);
• temp = temperature (ºC);
• HR = relative humidity;
• larvas = number of tadpoles/offspring in their parent’s mouth!
ACOUSTIC VARIABLES:
• Calls5min = calls in 5 min interval;
• SPL = Sound Pressure Level (dB);
• NC = number of notes/call;
• ND = note duration (either s or ms);
• CD = call duration (s);
• Agg Entropy = aggregate entropy;
• Avg Entropy = average entropy;
• DF = dominant frequency of call (Hz);
• RMS Amp = Root Mean Square Amplitude
• CRR = Call Repetition Rate (Calls/min, i.e., Calls5min/5)
require(cowplot) # for grid plotting
# not included in article as a graph but i think it's nice to show for reference to the methods
p1 <- ggplot(data = d, # object created
aes(x = Sex, # sex of individuals in the dataset
y = LHC)) + # y = SVL, one value dropped b/c NA
geom_boxplot(na.rm = TRUE, show.legend = FALSE, aes(color = Sex)) +
labs(x = "Sex of Frog at time of Capture", y = "Snout-Vent Length (mm)") +
ggtitle("Body Size of R. Darwinii Frogs Incluced in this Study") +
theme_classic() + # don't need the legend because x is categorical
theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
axis.title.x = element_text(size = 8, color = "dark green"),
axis.title.y = element_text(size = 8, color = "dark green"))
# CRR by Sex
p2 <- ggplot(data = d, aes(x = Sex, y = CRR, color = Sex)) +
geom_boxplot(na.rm = TRUE, show.legend = FALSE) + # don't need the legend because x is categorical
labs(x = "Sex", y = "Calls/min") +
ggtitle("Call Repetition Rate by Sex") +
theme_classic() +
theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
axis.title.x = element_text(size = 8, color = "dark green"),
axis.title.y = element_text(size = 8, color = "dark green"))
# Notes per Call (NC) vs Sex
p3 <- ggplot(data = d, aes(x = Sex, y = NC, color = Sex)) +
geom_boxplot(na.rm = TRUE, show.legend = FALSE) + # don't need the legend because x is categorical
labs(x = "Sex", y = "Notes per Call") +
ggtitle("Notes per Call by Sex") +
theme_classic() +
theme(plot.title = element_text(size = 10, color = "dark green",face = "bold"),
axis.title.x = element_text(size = 8, color = "dark green"),
axis.title.y = element_text(size = 8, color = "dark green"))
# Call Repetition Rate (CRR) histogram
p4 <- ggplot(data = d, aes(x = CRR, color = Sex)) +
geom_histogram(na.rm = TRUE, show.legend = FALSE) + facet_wrap( ~ Sex) +
labs(x = "Calls per Minute", y = "Frequency of Call Rate") +
ggtitle("Histogram of Call Repetition Rate") +
theme_classic() +
theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
axis.title.x = element_text(size = 8, color = "dark green"),
axis.title.y = element_text(size = 8, color = "dark green"))
# pregnant males : number of tadpoles vs call amplitude
p5 <- ggplot(data = d, aes(x = as.factor(Larvas), y = `RMS Amp`, color = Larvas)) +
geom_boxplot(na.rm = TRUE, show.legend = FALSE) + # filtering out all but pregnant males because other sexes don't have tadpoles.
xlab("Number of Tadpoles") +
ylab("Call Amplitude (dB)") +
ggtitle("Pregnant Males: Number of Tadpoles vs Call Amplitude") +
theme_classic() +
theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
axis.title.x = element_text(size = 8, color = "dark green"),
axis.title.y = element_text(size = 8, color = "dark green"))
# plotting the graphs together
plot_grid(p1, p2, p3, p4, p5, label_size = 8, nrow = 3)
Be sure to thoroughly explain what replications you are doing and comment your code so that it is easy for a reader to understand.
Include in this section relevant tables/figures/values from the original paper for comparison to what you accomplished with your replication.
Note that I want you to do the bulk of any exposition using text and markdown syntax outside of code blocks. That is, your document should not just be one big code block with R style comments but rather a nicely formatted report with code separated from exposition, interpretation, and discussion.]
# making a dataframe of the relevant variables so don't change the original dataset.
df<- d %>% dplyr::select(Nombre, Sex, Larvas, LHC, Peso, Calls5min, CRR, NC, ND, CD, DF, SPL, `RMS Amp`, `Agg Entropy`, `Avg Entropy`)
# renaming Sex variables in dataframe so will print nicer in the Tukey post-hoc. yes I could just use Sexo but I don't have to change all the code below if i do it like this. I didn't realize the labels were going to be a problem until after I wrote all the code for replication. Also now my visualization labels will match those used in the paper.
df <- df %>%
mutate(Sex = case_when(Sex == "Females" ~ "F",
Sex == "Non-pregnant males" ~ "NPM",
Sex == "Pregnant males" ~ "PM"))
print(df) %>%
kable(align = 'l', booktabs = TRUE) %>% # left aligned data
kable_styling(font_size = 10, full_width = TRUE) # font-size# checking work
## # A tibble: 31 × 15
## Nombre Sex Larvas LHC Peso Calls5min CRR NC ND CD DF SPL
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "Euge… F NA 23.4 1.15 8 1.6 15 0.257 2.14 3341. 71.4
## 2 "Mich… NPM NA 22.4 NA 4 0.8 17 0.156 1.56 3402. 63.2
## 3 "Dr\x… F NA 23.7 1.1 5 1 12 0.196 2.01 3686. NA
## 4 "Herl… NPM NA 22.7 0.92 5 1 33 0.182 1.72 3488. 62.8
## 5 "Luis… F NA 23.2 1.06 11 2.2 14 0.157 NA 3682. 64.2
## 6 "Fema… PM 5 22.6 1.11 3 0.6 14 0.176 1.74 3634. 62.4
## 7 "Obam… PM 2 22.4 0.93 9 1.8 22 0.144 1.54 3230. 68.3
## 8 "Edga… PM 3 22.2 0.92 8 1.6 17 0.133 1.42 4272. NA
## 9 "Olli… PM 2 22.2 0.86 8 1.6 14 0.209 1.92 3818. NA
## 10 "Rube… PM 2 21.6 0.86 3 0.6 4 0.162 1.64 3273 67.6
## # … with 21 more rows, and 3 more variables: `RMS Amp` <dbl>,
## # `Agg Entropy` <dbl>, `Avg Entropy` <dbl>
| Nombre | Sex | Larvas | LHC | Peso | Calls5min | CRR | NC | ND | CD | DF | SPL | RMS Amp | Agg Entropy | Avg Entropy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Eugenia | F | NA | 23.4 | 1.15 | 8 | 1.6 | 15 | 0.2565455 | 2.141000 | 3340.876 | 71.43103 | 162764.3 | 1.379379 | 1.524621 |
| Michelle | NPM | NA | 22.4 | NA | 4 | 0.8 | 17 | 0.1556250 | 1.557000 | 3402.231 | 63.22143 | 89155.9 | 2.427146 | 2.031938 |
| Dr<e1>cula | F | NA | 23.7 | 1.10 | 5 | 1.0 | 12 | 0.1964600 | 2.008200 | 3686.474 | NA | 53409.7 | 1.959700 | 1.851720 |
| Herlinda | NPM | NA | 22.7 | 0.92 | 5 | 1.0 | 33 | 0.1815974 | 1.723222 | 3487.814 | 62.77895 | 110800.1 | 2.882636 | 2.690169 |
| Luisa | F | NA | 23.2 | 1.06 | 11 | 2.2 | 14 | 0.1571250 | NA | 3682.167 | 64.16923 | 43954.0 | 1.926542 | 1.766208 |
| Femalesrancisco | PM | 5 | 22.6 | 1.11 | 3 | 0.6 | 14 | 0.1764444 | 1.737125 | 3633.539 | 62.37500 | 62616.0 | 2.544982 | 2.458500 |
| Obama | PM | 2 | 22.4 | 0.93 | 9 | 1.8 | 22 | 0.1435536 | 1.542750 | 3229.946 | 68.26000 | 219744.9 | 1.852411 | 1.757268 |
| Edgar | PM | 3 | 22.2 | 0.92 | 8 | 1.6 | 17 | 0.1327429 | 1.419000 | 4272.157 | NA | 58691.5 | 1.734871 | 1.767586 |
| Ollin | PM | 2 | 22.2 | 0.86 | 8 | 1.6 | 14 | 0.2088393 | 1.920333 | 3817.511 | NA | 105319.3 | 3.143518 | 2.380518 |
| Ruben | PM | 2 | 21.6 | 0.86 | 3 | 0.6 | 4 | 0.1624118 | 1.643667 | 3273.000 | 67.56667 | 29010.9 | 1.593118 | 1.594235 |
| Tehuacan | NPM | NA | 21.8 | 0.83 | 11 | 2.2 | 8 | 0.2013824 | 1.629500 | 3951.965 | 58.16000 | 40606.4 | 1.942529 | 1.919500 |
| Esteban | NPM | NA | 20.6 | 0.78 | 9 | 1.8 | 10 | 0.1418584 | 1.311454 | 3840.141 | 67.59000 | 356038.4 | 2.610080 | 2.721850 |
| Femaleslash | NPM | NA | 21.8 | 0.76 | 11 | 2.2 | 19 | 0.1336744 | 1.627500 | 3377.186 | NA | 64225.7 | 1.739302 | 1.684233 |
| 267 | PM | 5 | 23.4 | 1.09 | 12 | 2.4 | 15 | 0.1678679 | 1.543800 | 4059.596 | 56.92857 | 73951.4 | 3.161019 | 2.587359 |
| Cirilo | PM | 3 | 22.6 | 0.94 | 4 | 0.8 | 5 | 0.1598000 | 1.688200 | 3445.300 | NA | 54693.2 | 1.606550 | 1.788800 |
| Abel | PM | 2 | 21.7 | 1.00 | 7 | 1.4 | 17 | 0.1519701 | 1.549688 | 3617.600 | 69.14000 | 140005.6 | 1.561761 | 1.645627 |
| Izcoatl | PM | 2 | 22.3 | 0.90 | 4 | 0.8 | 7 | 0.1414857 | NA | 3652.040 | 64.54286 | 106579.7 | 1.791514 | 1.915029 |
| Ismael | NPM | NA | NA | 0.74 | 8 | 1.6 | 11 | 0.1312581 | 1.386143 | 3923.200 | NA | 49986.7 | 1.601355 | 1.717290 |
| Victoria | F | NA | 23.6 | 1.11 | 8 | 1.6 | 8 | 0.1626774 | 1.518571 | 2995.194 | 58.87500 | 12731.9 | 2.528581 | 2.046064 |
| Victor | NPM | NA | 23.2 | 0.86 | 5 | 1.0 | 13 | 0.1403438 | 1.403000 | 3962.100 | 62.73750 | 55335.5 | 1.552844 | 1.688219 |
| Melchor | NPM | NA | 21.6 | 0.75 | 5 | 1.0 | 27 | 0.1409467 | 1.487571 | 3929.941 | 64.51000 | 134976.5 | 1.685573 | 1.697280 |
| Femalesrancisco | NPM | NA | 22.7 | 0.80 | 6 | 1.2 | 8 | 0.1151310 | NA | 3769.326 | 61.84667 | 189382.9 | 2.258976 | 1.729679 |
| Xavier | NPM | NA | 21.3 | 0.68 | 10 | 2.0 | 19 | 0.1580547 | 1.478333 | 3649.885 | 59.92400 | 311815.6 | 1.474633 | 1.605016 |
| Ernesto | NPM | NA | 21.8 | 0.78 | 6 | 1.2 | 15 | 0.1791346 | 1.850800 | 3538.088 | 53.41818 | 54669.0 | 1.980154 | 2.021135 |
| 350 | F | NA | 21.9 | 0.90 | 7 | 1.4 | 8 | 0.1615600 | 2.005500 | 3341.940 | 64.53750 | 51199.9 | 2.132840 | 1.987320 |
| 352 | F | NA | 23.2 | 1.13 | 12 | 2.4 | 12 | 0.2103800 | 1.862300 | 3593.478 | 61.70000 | 86866.9 | 2.416900 | 2.276980 |
| Ele | F | NA | 22.1 | 0.92 | 10 | 2.0 | 33 | 0.2107479 | 1.707458 | 3254.204 | 66.65769 | 72817.7 | 2.219202 | 2.009328 |
| Lucrecia | F | NA | 22.7 | 1.00 | 9 | 1.8 | 22 | 0.1979552 | 1.981750 | 3113.609 | 65.52143 | 42238.9 | 3.146298 | 2.457508 |
| Socorro | F | NA | 25.2 | 1.19 | 7 | 1.4 | 7 | 0.1683000 | 1.761000 | 3508.483 | 62.15000 | 54362.6 | 2.919900 | 2.201133 |
| ChiFemales<fa> | PM | 4 | 20.9 | 1.00 | 8 | 1.6 | 27 | 0.2086509 | 1.834045 | 3547.705 | 59.85652 | 166881.2 | 1.648009 | 1.775726 |
| Silvio | PM | 4 | 22.0 | 1.00 | 3 | 0.6 | 4 | 0.1658947 | 1.463000 | 3617.600 | 66.31667 | 61938.0 | 1.865895 | 1.778105 |
# dividing df by sex so easier to access variables. Commented out the ones i don't actually need in the replication.
# np.males <- df %>% dplyr::filter(Sex == "Non-pregnant males") #Non-pregnant males (N = 11)
# head(np.males)
p.males <- df %>% dplyr::filter(Sex == "PM") # Pregnant males (N = 11)
head(p.males)
## # A tibble: 6 × 15
## Nombre Sex Larvas LHC Peso Calls5min CRR NC ND CD DF SPL
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Female… PM 5 22.6 1.11 3 0.6 14 0.176 1.74 3634. 62.4
## 2 Obama PM 2 22.4 0.93 9 1.8 22 0.144 1.54 3230. 68.3
## 3 Edgar PM 3 22.2 0.92 8 1.6 17 0.133 1.42 4272. NA
## 4 Ollin PM 2 22.2 0.86 8 1.6 14 0.209 1.92 3818. NA
## 5 Ruben PM 2 21.6 0.86 3 0.6 4 0.162 1.64 3273 67.6
## 6 267 PM 5 23.4 1.09 12 2.4 15 0.168 1.54 4060. 56.9
## # … with 3 more variables: `RMS Amp` <dbl>, `Agg Entropy` <dbl>,
## # `Avg Entropy` <dbl>
# females <- df %>% dplyr::filter(Sex == "Females") # Females (N = 9, according to the article, N=10, but the dataset only has 9.)
# head(females)
[descriptive statistical analysis]
The authors reported performing the following statistical analyses before performing their ANOVA. As mentioned before, they used Simple Pearson Correlations to determine if the different acoustic variables correlate with either Snout-Vent Length (LHC, mm) or mass (g) (Serrano et al., 2020). They also performed Spearman Correlations to determine an association between acoustic variables and the number of tadpoles in the vocal sacs of pregnant males (Serrano et al., 2020). Below, I conduct these tests. Ultimately, I do not find them as informative as the QQ Plots above.
# SHOULD I TAKE THESE OUT? THEY AREN'T VERY INFORMATIVE.
# Pearson correlations between snout-vent length and acoustic variables
# Snout Vent Length (LHC, mm) vs Note Duration (s)
stats::cor.test(df$LHC, df$ND, method = "pearson") # no correlation
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$ND
## t = 0.9008, df = 28, p-value = 0.3754
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2048321 0.4979823
## sample estimates:
## cor
## 0.1678215
# Snout Vent Length (LHC, mm) vs Call Duration (s)
stats::cor.test(df$LHC, df$CD, method = "pearson") # no correlation
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$CD
## t = 1.5882, df = 25, p-value = 0.1248
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08731632 0.61231257
## sample estimates:
## cor
## 0.3027431
# Snout Vent Length (LHC, mm) vs Dominant Frequency (Hz)
stats::cor.test(df$LHC, df$DF, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$DF
## t = -0.60019, df = 28, p-value = 0.5532
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4545167 0.2580443
## sample estimates:
## cor
## -0.1127023
# Snout Vent Length (LHC, mm) vs Aggregate Entropy
stats::cor.test(df$LHC, df$`Agg Entropy`, method = "pearson") # near correlation (p = 0.058)
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$`Agg Entropy`
## t = 1.9699, df = 28, p-value = 0.05881
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.01301586 0.62997452
## sample estimates:
## cor
## 0.3488894
stats::cor.test(log(df$LHC), log(df$`Agg Entropy`), method = "pearson") # log transformation does not help.
##
## Pearson's product-moment correlation
##
## data: log(df$LHC) and log(df$`Agg Entropy`)
## t = 1.875, df = 28, p-value = 0.07127
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.02987812 0.61969111
## sample estimates:
## cor
## 0.3339862
# Snout Vent Length (LHC, mm) vs Avg Entropy
stats::cor.test(df$LHC, df$`Avg Entropy`, method = "pearson") #no correlation
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$`Avg Entropy`
## t = 0.71073, df = 28, p-value = 0.4831
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2385911 0.4708103
## sample estimates:
## cor
## 0.1331208
# Snout Vent Length (LHC, mm) vs Root Mean Squares of call Amplitude
stats::cor.test(df$LHC, df$`RMS Amp`, method = "pearson") # CORRELATED! (p = 0.01001)
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$`RMS Amp`
## t = -2.7628, df = 28, p-value = 0.01001
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7054719 -0.1230934
## sample estimates:
## cor
## -0.4628373
# Snout Vent Length (LHC, mm) vs Call Repetition Rate (calls/min)
stats::cor.test(df$LHC, df$CRR, method = "pearson") # super no correlation
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$CRR
## t = 0.076324, df = 28, p-value = 0.9397
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3476532 0.3727548
## sample estimates:
## cor
## 0.01442243
# Pearson correlations between weight and acoustic variables Weight (g) vs
# Note Duration (s)
stats::cor.test(df$Peso, df$ND, method = "pearson") # CORRELATED! (p = 0.006363)
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$ND
## t = 2.9494, df = 28, p-value = 0.006363
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1535281 0.7207203
## sample estimates:
## cor
## 0.486868
# Weight (g) vs Call Duration (s)
stats::cor.test(df$Peso, df$CD, method = "pearson") # CORRELATED! (p = 0.0185)
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$CD
## t = 2.5199, df = 25, p-value = 0.0185
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08448738 0.70883635
## sample estimates:
## cor
## 0.4500518
# Weight (g) vs Dominant Frequency (Hz)
stats::cor.test(df$Peso, df$DF, method = "pearson") # no correlation
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$DF
## t = -1.4735, df = 28, p-value = 0.1518
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5731400 0.1018496
## sample estimates:
## cor
## -0.268263
# Weight (g) vs Aggregate Entropy of call
stats::cor.test(df$Peso, df$`Agg Entropy`, method = "pearson") # not correlated
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$`Agg Entropy`
## t = 1.812, df = 28, p-value = 0.08073
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.04110161 0.61272059
## sample estimates:
## cor
## 0.3239648
stats::cor.test(log(df$Peso), log(df$`Agg Entropy`), method = "pearson") # log transformation did not help
##
## Pearson's product-moment correlation
##
## data: log(df$Peso) and log(df$`Agg Entropy`)
## t = 1.8005, df = 28, p-value = 0.08257
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.0431561 0.6114335
## sample estimates:
## cor
## 0.3221214
# Weight (g) vs Average Entropy of call
stats::cor.test(df$Peso, df$`Avg Entropy`, method = "pearson") # no correlation
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$`Avg Entropy`
## t = 1.275, df = 28, p-value = 0.2128
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1376277 0.5482556
## sample estimates:
## cor
## 0.2342567
# Weight (g) vs Root Mean Squares of call Amplitude
stats::cor.test(df$Peso, df$`RMS Amp`, method = "pearson") # no correlation, but very close (p = 0.05853)
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$`RMS Amp`
## t = -1.9723, df = 28, p-value = 0.05853
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.63022222 0.01260513
## sample estimates:
## cor
## -0.3492501
# Weight (g) vs Call Repetition Rate (calls/min)
stats::cor.test(df$Peso, df$CRR, method = "pearson") # no correlation at all
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$CRR
## t = 0.11242, df = 28, p-value = 0.9113
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3416424 0.3786131
## sample estimates:
## cor
## 0.02124125
# Spearman correlations: number of tadpoles in pregnant males' vs acoustic
# variables Number tadpoles vs note duration (s)
stats::cor(p.males$Larvas, p.males$ND, method = "spearman") # no correlation
## [1] 0.4227058
# Number tadpoles vs call duration (s)
stats::cor(p.males$Larvas, p.males$CD, method = "spearman") # no correlation
## [1] NA
# Number tadpoles vs dominant frequency (Hz)
stats::cor(p.males$Larvas, p.males$DF, method = "spearman") # no correlation
## [1] 0.2960874
# Number tadpoles vs Aggregate Entropy of call
stats::cor(p.males$Larvas, p.males$`Agg Entropy`, method = "spearman") # no correlation
## [1] 0.4419197
# Number tadpoles vs Average Entropy of call
stats::cor(p.males$Larvas, p.males$`Avg Entropy`, method = "spearman") # no correlation
## [1] 0.5620066
# Number tadpoles vs Aggregate Entropy of call
stats::cor(p.males$Larvas, p.males$`RMS Amp`, method = "spearman") # no correlation
## [1] -0.201746
# Number tadpoles vs Call Repetition Rate (calls/min)
stats::cor(p.males$Larvas, p.males$CRR, method = "spearman") # no correlation
## [1] -0.01716697
Serrano et al. (2020) first verified the normality criteria for all the variables in their data using quantile-quantile plots, likely to determine if any of their variables needed to be transformed before analyzing them with ANOVAs. Though they do not show these plots in their paper, I am replicating them below because the authors do not report which variables were transformed nor do they report how they were transformed, other than stating they used the Box Cox transformation.
require(ggpubr) # for qqplots
require(cowplot) # for wrapping the plots
# QQ Plots, will show warnings because there are NAs in the data. this is
# fine.
qp1 <- ggqqplot(df$LHC, title = "Snout-Vent Length (mm) QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # normal
qp2 <- ggqqplot(df$Peso, title = "Weight (g) QQ Plot") + font("title", size = 10,
color = "dark green", face = "bold") # normal
qp3 <- ggqqplot(df$Calls5min, title = "Call Rate Interval (Calls/5min) QQ Plot") +
font("title", size = 10, color = "dark green", face = "bold") # normal and looks exactly the same as CRR variable!
qp4 <- ggqqplot(df$CRR, title = "Call Repetition Rate (Calls/min) QQ Plot") +
font("title", size = 10, color = "dark green", face = "bold") # derived this from Calls5min, normal
qp5 <- ggqqplot(df$NC, title = "Notes per Call QQ Polt") + font("title", size = 10,
color = "dark green", face = "bold") # normal
qp6 <- ggqqplot(df$ND, title = "Note Duration (s) QQPlot") + font("title", size = 10,
color = "dark green", face = "bold") # normal
qp7 <- ggqqplot(df$CD, title = "Call Duration (s) QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # normal
qp8 <- ggqqplot(df$DF, title = "Dominant Frequency (Hz) QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # normal
qp9 <- ggqqplot(df$SPL, title = "Sound Pressure Level (dB) QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # normal
qp10 <- ggqqplot(df$`Agg Entropy`, title = "Aggregate Entropy QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # not normal
qp11 <- ggqqplot(df$`Avg Entropy`, title = "Average Entropy QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # not normal
# plotting qq plots all together
plot_grid(qp1, qp2, qp3, qp4, label_size = 8, nrow = 2)
plot_grid(qp5, qp6, qp7, qp8, label_size = 8, nrow = 2)
plot_grid(qp9, qp10, qp11, label_size = 8, nrow = 1)
# detaching unneeded packages
detach(package:ggpubr)
detach(package:cowplot)
Based on these plots, the two variables that appear to not fit the normality criteria for ANOVA are the two acoustic variables of entropy (Average Entropy and Aggregated Entropy). As the authors describe in their Statistical methods, I will transform these data useing a boxcox() from the {MASS} package to determine how to best transform these variables for ANOVA.
Here, I run Box-Cox transformation for linear models/ANOVAS to determine how to transform variables not fitting normality criterion based on qqplot. The boxcox() function computes and plots the log-likelihood for an object to show the exponent power (\(\lambda\)) the response variable in the object should be raised to transform the object to fit normality criteria for comparison in ANOVA. Based on the QQ plots above, the variables not meeting normality criteria are Avg Entropy and Agg Entropy, so I ran Box-Cox transformations on them to determine how to transform them for ANOVA. For each variable, I plotted the output to visualize what exponential power I should transform the variables, and added the transformed variables into the working dataframe (df).
Below are the arguments used in this function:
• The object argument is the response variables (Avg Entropy and Agg Entropy) against the predictor variables (Sex or Sex+Snout-Vent Length (LHC)) that will be compared in an ANOVA (or in a linear model, but not doing that here). • The lambda argument is the vector values to check as potential exponential powers to raise the variable to in a transformation. By default, the function uses (-2, 2) with 0.1 interval steps. Functionally, this is the x variable in the plot output.
• plotit = is a logical argument, and when it is set to TRUE, it will output a plot of the 95% confidence log-likelihood curve. Setting it to FALSE prints all the the x vakues and y values separately.
• interp = is a logical argument controlling if the spline interpolation is used or not. The spline interpolation fits low-degree polynomials to small subsets of the values, instead of fitting a single, high-degree polynomial to all of the values at once. You should use TRUE if plotting data with a lambda less than 100.
• eps = defaults to 1/50 (0.02); this is the tolerance for lambda to be 0. (I used the default.)
• xlab = defaults to \(\lambda\) (I used the default).
• ylab = defaults to “log-likelihood” (I used the default).
Below, you will see where I conducted these transformations on both the one-way model (Entropy ~ Sex) and the two-way model (Entropy ~ Sex+Snout-vent length); however, after running the ANOVAs on these acoustic variables, I found the authors did not use the comparisons of the acoustic variables to the different sexes corrected for SVL to generate their F-values. For this reason, I commented out those boxcox functions where I calculated how to transform the entropy variables for the two-way model below but you can see below how they would have worked and how I would have added those transformations to the working dataframe (df).
require(MASS) # reported used for normalizing data for ANOVA
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
# boxcox of agg entropy vs sex.
boxcox(df$`Agg Entropy` ~ df$Sex, lambda = seq(-2, 2, 1/10), plotit = TRUE, # "object"
interp = TRUE, eps = 1/50, xlab = expression(lambda), #interp TRUE gives slightly lower lambda (like 0.9 vs 0.8 with FALSE)
ylab = "log-Likelihood")
# transformation for Aggregated Entropy Vs Sex
df <- df %>% mutate(AggE_t1 = (`Agg Entropy`)^(-9/10)) # adding column of transformed AggEntropy
# boxcox of agg entropy vs Sex corrected for snout-vent length
#boxcox(df$`Agg Entropy` ~ df$Sex + df$LHC, lambda = seq(-2, 2, 1/10), plotit = TRUE, # object
# interp = TRUE, eps = 1/50, xlab = expression(lambda), #"interp"
# ylab = "log-Likelihood")
# transformation for Aggregated Entropy Vs Sex + SVL
#df <- df %>% mutate(AggE_t2 = (`Agg Entropy`)^(-7/10)) # adding column of transformed AggEntropy
# boxcox of Avg entropy vs Sex
boxcox(df$`Avg Entropy` ~ df$Sex, lambda = seq(-2, 2, 1/10), plotit = TRUE, # "object"
interp = TRUE, eps = 1/50, xlab = expression(lambda), #interp TRUE gives slightly lower lambda (like 0.9 vs 0.8 with FALSE)
ylab = "log-Likelihood")
# transformation for Aggregated Entropy Vs Sex
df <- df %>% mutate(AvgE_t1 = (`Agg Entropy`)^(-1/10)) # adding column of transformed AvgEntropy
In the methods, the authors reported using {car} package for Anova function (though they do not report the type). To replicate this, I started by working with the two entropy variables I found how to best transform for them to meet normality criteria of ANOVA. I have commented out the work with these variables that produced incorrect F-values, relative to that reported in Table 1. Such work includes running a one-way and two-way Anova model with transformed Agg Entropy against the sexes and against the sexes corrected for snout-vent length. I did the same with Avg Entropy. I also determined what type (I, II, or III) of ANOVA the authors used below, as you can see for the Agg Entropy and Avg Entropy. For these two variables, since the F-values and Sums of Squares of the untransformed Agg Entropy vs Sex for three types of ANOVA are the same regardless of the type, meaning the variables are balanced and factors are orthogonal, I proceeded with the remaining acoustic variables by just running the aov() function. This seemed fruitful as for each variable, I successfully replicated the exact F-value and level of significance reported in Table 1 by the authors.
Below, for each acoustic variable included in Table 1, I do the following: 1. I make an ANOVA model (using the aov() function) of the acoustic variable to see if it varies between the three sexes (female (F), pregnant male (MP), non-pregnant male (NMP). Sex is the predictor variable, and the acoustic variable is the response variable. Moreover, for Table 1, the authors do NOT correct by snout-vent length (i.e., Sex+SVL) as conducting this correction in the model gives a different F-value than what the authors report in table 1.:
aov(data = df, [acoustic variable] ~ Sex) 2. I then print the full summary of the original ANOVA model using the summary.aov()
3. then I pull out the F-value from the model summary, filtering for the variable Sex since I am comparing the acoustic variable between the three sexes 4. I print the omnibus F-value so show that it has been isolated.
# Snout-Vent Length (mm) between different Sexes.
SVL_sex <- aov(data = df, LHC ~ Sex) # SVL = LHC
summary.aov(SVL_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 8.312 4.156 6.583 0.00469 **
## Residuals 27 17.046 0.631
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
# pulling out the F-statistic (nonparametric)
SVL_sex.F <- SVL_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
SVL_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 6.582981
# Mass (g) between different Sexes.
mass_sex <- aov(data = df, Peso ~ Sex) # Peso = Mass/weight
summary.aov(mass_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 0.3655 0.18275 25.43 6.17e-07 ***
## Residuals 27 0.1940 0.00719
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
# pulling out the F-statistic (nonparametric)
mass_sex.F <- mass_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
mass_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1!
## [1] 25.43115
# Note Duration (ND, s) between different Sexes. the article reports this is in ms in Table 1, but, based on the data set, they are in seconds.
ND_sex <- aov(data = df, ND ~ Sex) # model
summary.aov(ND_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 0.007553 0.003776 4.97 0.0142 *
## Residuals 28 0.021276 0.000760
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# pulling out the F-statistic (nonparametric)
ND_sex.F <- ND_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
ND_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 4.969906
# Call Duration (CD, s) between different Sexes.
CD_sex <- aov(data = df, CD ~ Sex) # model
summary.aov(CD_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 0.4983 0.24917 8.122 0.00191 **
## Residuals 25 0.7669 0.03068
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 3 observations deleted due to missingness
# pulling out the F-statistic (nonparametric)
CD_sex.F <- CD_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
CD_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 8.122413
# Dominant Frequency (DF, Hz) between different Sexes.
DF_sex <- aov(data = df, DF ~ Sex) # model
summary.aov(DF_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 561049 280524 3.998 0.0297 *
## Residuals 28 1964686 70167
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# pulling out the F-statistic (nonparametric)
DF_sex.F <- DF_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
DF_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 3.997933
# Aggregated Entropy between different Sexes.
AggE_sex <- aov(data = df, `Agg Entropy` ~ Sex) # untransformed, also pull out F-value
summary.aov(AggE_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 0.445 0.2223 0.757 0.479
## Residuals 28 8.226 0.2938
# pulling out the F-statistic (nonparametric)
AggE_sex.F <- AggE_sex %>%
generics::tidy() %>%
filter(term == "Sex") # since comparing between sexes
AggE_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of Agg Entropy alone (full table will also show the p-value)
## [1] 0.7566687
# to show type 2 and 3 print the same F-value
# type 2 ANOVA
AggE_sex.a2 <- Anova(AggE_sex, type = "II")
# print the results
AggE_sex.a2 # the F value here IS EXACTLY WHAT THEY GOT IN TABLE 1????? so apparently, despite this variable not fulfilling normality criteria for linear models. This is very confusing to me. # type 3 ANOVA
## Anova Table (Type II tests)
##
## Response: Agg Entropy
## Sum Sq Df F value Pr(>F)
## Sex 0.4446 2 0.7567 0.4786
## Residuals 8.2265 28
AggE_sex.a3 <- Anova(AggE_sex, type = "III")
# print the results
AggE_sex.a3
## Anova Table (Type III tests)
##
## Response: Agg Entropy
## Sum Sq Df F value Pr(>F)
## (Intercept) 47.286 1 160.9427 3.967e-13 ***
## Sex 0.445 2 0.7567 0.4786
## Residuals 8.226 28
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# All F-values and Sums of squares of the untransformed Agg Entropy vs Sex for three types of ANOVA, meaning the variables are balanced and factors are orthogonal, so can trust the default type 1.
# AggE vs Sex transformed (t1) vs sex; did this because the authors report doing boxcox-informed transformations, but gives a different F-value than what they reported in Table 1, so have commented them out.
# AggE_sex <- aov(data = df, AggE_t1 ~ Sex)
# AggE_sex.a <- Anova(AggE_sex, type = c(2))
# AggE_sex.a
# Average Entropy between different Sexes. Because of the above, I did not use the boxcox transformed Avg Entropy, and it works out that the F-value calculated below is the exact same as that reported in Table 1.
AvgE_sex <- aov(data = df, `Avg Entropy` ~ Sex) # untransformed, also pull out F-value
summary.aov(AvgE_sex) #printing the summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 0.024 0.01193 0.097 0.908
## Residuals 28 3.438 0.12277
# to show type 2 and 3 print the same F-value
# type 2 ANOVA
AvgE_sex.a2 <- Anova(AvgE_sex, type = "II")
# print the results
AvgE_sex.a2 # same as the original and same as what the authors reported.
## Anova Table (Type II tests)
##
## Response: Avg Entropy
## Sum Sq Df F value Pr(>F)
## Sex 0.0239 2 0.0972 0.9077
## Residuals 3.4375 28
AvgE_sex.a3 <- Anova(AvgE_sex, type = "III")
# print the results
AvgE_sex.a3 # same as the original and same as what the authors reported.
## Anova Table (Type III tests)
##
## Response: Avg Entropy
## Sum Sq Df F value Pr(>F)
## (Intercept) 36.485 1 297.1863 <2e-16 ***
## Sex 0.024 2 0.0972 0.9077
## Residuals 3.438 28
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# All F-values and Sums of squares of the untransformed Avg Entropy vs Sex for three types of ANOVA, meaning the variables are balanced and factors are orthogonal.
# pulling out the F-statistic (nonparametric) from the type 1 ANOVA model
AvgE_sex.F <- AvgE_sex %>%
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
AvgE_sex.F$statistic #AvgE F-value. matches what's reported in Table 1
## [1] 0.09718057
# Call Rate (calls/min) between the Sexes.
CRR_sex <- aov(data = df, CRR ~ Sex)
summary.aov(CRR_sex) #printing the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 1.032 0.5160 1.819 0.181
## Residuals 28 7.943 0.2837
# pulling out the F-statistic (nonparametric)
CRR_sex.F <- CRR_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
CRR_sex.F$statistic #Call Rate F-value, matches the F-value reported in Table 1
## [1] 1.818948
# Sound Pressure Level (SPL) between the Sexes.
SPL_sex <- aov(data = df, SPL ~ Sex)
summary.aov(SPL_sex) #printing the summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 45.2 22.59 1.374 0.274
## Residuals 22 361.7 16.44
## 6 observations deleted due to missingness
# pulling out the F-statistic (nonparametric)
SPL_sex.F <- SPL_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
SPL_sex.F$statistic # SPL F-value
## [1] 1.373854
# dataframe of all the omnibus F-values from the above ANOVAs plus two empty columns so it will align with the Acoustics table below
F.stat <- data.frame(sex = " ", #will correspond with sex
n = " ", # will correspond with sex pop size
svl.F = SVL_sex.F$statistic, # snout-vent length
mass.F = mass_sex.F$statistic, # mass
nd.F = ND_sex.F$statistic, # note duration
cd.F = CD_sex.F$statistic, # call duration
df.F = DF_sex.F$statistic, # dominant frequency
aggE.F = AggE_sex.F$statistic, # aggregated entropy
avgE.F = AvgE_sex.F$statistic, # avg entropy
crr.F = CRR_sex.F$statistic, # call rep rate
spl.F = SPL_sex.F$statistic) # sound pressure level
F.stat
## sex n svl.F mass.F nd.F cd.F df.F aggE.F avgE.F
## 1 6.582981 25.43115 4.969906 8.122413 3.997933 0.7566687 0.09718057
## crr.F spl.F
## 1 1.818948 1.373854
detach(package:MASS)
Here, I replicate the Mean and Standard Deviations reported for each acoustic variable for each sex in Table 1. I do this by
#for making nice tables and doing data table manipulations
require(data.table)
# making a table of the calculated mean and standard deviation for each acoustic variable reported in Table 1
bySex <- group_by(df, Sex)
Acoustics <- bySex %>%
summarise(n_cases = n(), # sample size in group
mean_SVL = mean(LHC, na.rm = TRUE), # mean SVL
sd_SVL = sd(LHC, na.rm = TRUE), # standard deviation SVL
mean_mass = mean(Peso, na.rm = TRUE), # mean mass (g)
sd_mass = sd(Peso, na.rm = TRUE), # sd mass
mean_ND = mean(ND, na.rm = TRUE), # mean note duration (s)
sd_ND = sd(ND, na.rm = TRUE), # sd note duration
mean_CD = mean(CD, na.rm = TRUE), # mean call duration (s)
sd_CD = sd(CD, na.rm = TRUE),
mean_DF = mean(DF, na.rm = TRUE), # mean dominant freq (Hz)
sd_DF = sd(DF, na.rm = TRUE),
mean_AggE = mean(`Agg Entropy`, na.rm = TRUE), # mean Aggregated Entropy
sd_AggE = sd(`Agg Entropy`, na.rm = TRUE),
mean_AvgE = mean(`Avg Entropy`, na.rm = TRUE), # mean Average Entropy
sd_AvgE = sd(`Avg Entropy`, na.rm = TRUE),
mean_CRR = mean(CRR, na.rm = TRUE), # mean Call/min
sd_CRR = sd(CRR, na.rm = TRUE),
mean_SPL = mean(SPL, na.rm = TRUE), # mean sound pressure level (dB)
sd_SPL = sd(SPL, na.rm = TRUE)
)
# Reorganizing the summarized dataframe created above to combine the mean and sd into the same column.
Acoustics <- Acoustics %>% # changing labels back to their original state for the table here.
mutate(Sex = case_when(Sex == "F" ~ "Females",
Sex == "NPM" ~ "Non-pregnant males",
Sex == "PM" ~ "Pregnant males")) %>%
unite(mean_SVL, sd_SVL, col = "snout_vent_length", sep = " ± ") %>%
unite(mean_mass, sd_mass, col = "mass", sep = " ± ") %>%
unite(mean_ND, sd_ND, col = "note_duration", sep =" ± ") %>%
unite(mean_CD, sd_CD, col = "call_duration", sep = " ± ") %>%
unite(mean_DF, sd_DF, col = "dominant_frequencey", sep =" ± ") %>%
unite(mean_AggE, sd_AggE, col = "aggregate_entropy", sep = " ± ") %>%
unite(mean_AvgE, sd_AvgE, col = "average_entropy", sep = " ± ") %>%
unite(mean_CRR, sd_CRR, col = "call_repitition_rate", sep = " ± ") %>%
unite(mean_SPL, sd_SPL, col = "sound_pressure_level", sep = " ± ") %>%
transpose(keep.names = "Acoustic Variable") # flipping the orientation of the table so Sex will be at the top
# adding the F.stat table to the Acoustic table
F.stat <- F.stat %>% transpose(keep.names = "Acoustic_Variable")# flipping orientation
F.stat # printing to ensure it worked
## Acoustic_Variable V1
## 1 sex
## 2 n
## 3 svl.F 6.58298100665384
## 4 mass.F 25.4311542105701
## 5 nd.F 4.96990639070826
## 6 cd.F 8.12241272701847
## 7 df.F 3.99793257425526
## 8 aggE.F 0.756668690621354
## 9 avgE.F 0.0971805730424986
## 10 crr.F 1.81894792111049
## 11 spl.F 1.37385423279138
Acoustics <- mutate(Acoustics, F_value = F.stat$V1) %>% # adding the F-values to the Acoustics table
rename("F" = V1, "NPM" = V2, "PM" = V3) # renaming the Acoustics data table columns
Acoustics %>% # printing the final data table, using kable to make it look nice :)
kable(align = 'l', booktabs = TRUE) %>% # left aligned data
kable_styling(font_size = 11, full_width = TRUE) # font-size
| Acoustic Variable | F | NPM | PM | F_value |
|---|---|---|---|---|
| Sex | Females | Non-pregnant males | Pregnant males | |
| n_cases | 9 | 11 | 11 | |
| snout_vent_length | 23.2222222222222 ± 0.974394398816231 | 21.99 ± 0.768042244208538 | 22.1727272727273 ± 0.643569590783948 | 6.58298100665384 |
| mass | 1.06222222222222 ± 0.101707642015949 | 0.79 ± 0.0673300329224139 | 0.964545454545455 ± 0.0839480358750146 | 25.4311542105701 |
| note_duration | 0.191305666333333 ± 0.0325399835957568 | 0.152636938090909 ± 0.0256564327093529 | 0.165423762818182 ± 0.0249443852582658 | 4.96990639070826 |
| call_duration | 1.87322247025 ± 0.201837134488656 | 1.5454524386 ± 0.165680686897357 | 1.6341607955 ± 0.161488820849156 | 8.12241272701847 |
| dominant_frequencey | 3390.71382922222 ± 245.998679493103 | 3711.98892163636 ± 228.989746357455 | 3651.45401072727 ± 309.224874457775 | 3.99793257425526 |
| aggregate_entropy | 2.292149032 ± 0.536556919319484 | 2.01411166836364 ± 0.469465368911618 | 2.04578614972727 ± 0.609866842967914 | 0.756668690621354 |
| average_entropy | 2.01343139866667 ± 0.280593615650822 | 1.95511875736364 ± 0.397587255429383 | 1.94988657527273 ± 0.350271792331423 | 0.0971805730424986 |
| call_repitition_rate | 1.71111111111111 ± 0.4371625682868 | 1.45454545454545 ± 0.522232967867094 | 1.25454545454545 ± 0.607229176445988 | 1.81894792111049 |
| sound_pressure_level | 64.38023576625 ± 3.7557077162866 | 61.5763027144444 ± 4.05125858350565 | 64.37328545625 ± 4.33670962047346 | 1.37385423279138 |
detach(package:data.table)
Inferential statistical analysis
Whiskers represent the 90th and 10th percentiles.
require(cowplot)
# A Boxplot of Note Duration
fig3_A <- ggplot(data = df, aes(x = Sex, y = ND)) + # made the object
geom_boxplot(na.rm = TRUE) + # boxplot
labs(x = "Sex", y = "Note Duration (s)", title = "A") + # axis labels and title with figure label
scale_y_continuous(breaks = c(0.15, 0.20, 0.25, 0.30), # so axis matches that in the article
position = 'left') +
theme_classic()
# B Boxplot of Call Duration
fig3_B <- ggplot(data = df, aes(x = Sex, y = CD)) +
geom_boxplot(na.rm = TRUE) +
labs(x = "Sex", y = "Call Duration (s)", title = "B") + # scaling didn't work here. RIP
theme_classic()
# C Boxplot of Dominant Frequency
fig3_C <- ggplot(data = df, aes(x = Sex, y = DF)) + #define object
geom_boxplot(na.rm = TRUE) + # boxplot
labs(x = "Sex", y = "Dominant Frequency (Hz)", title = "C") + # axis labels and title with figure label, couldn't get the y-axis label to match here either.
theme_classic()
# organize the output of figures using cowplot
plot_grid(fig3_A, fig3_B, fig3_C, label_size = 8, nrow = 1)
detach(package:cowplot)
visualization filter data for 4 notes per call (NC == 4, n = 22). drop_na() using
MASS::lda() Linear Discriminant Analysis LD1 = f and npm, LD2 = f and pm
groups + x1 + x2
require(MASS)
df <- df %>%
drop_na(DF, ND, CRR, `Agg Entropy`, SPL)
Sex1 <- df %>%
dplyr::filter(Sex != "PM") # filtering for only females and nonpregnant males
Sex2 <- df %>%
dplyr::filter(Sex != "NPM") # filtering for only females and pregnant males
lda1 <- MASS::lda(Sex ~ DF + ND + CRR + `Agg Entropy` + SPL, data = Sex1)
LD1 <- lda1$scaling
# normalize : subtract mean, divide by SD,
detach(package:MASS)
Narrative section that overviews how successful were you at replicating the analyses and visualizations in the study.
1. What problems did you encounter? Some of the data described in the methods isn’t included in the data frame. Transformations (how, for what analyses, and which variables) not inclduded in the methods 2. Why might you have encountered those problems?
3. What details were lacking from the original study’s methods that might have hampered your ability to replicate the authors’ results?
• Sandmeier, F. (2016). Rhinoderma darwinii [Encyclopedia]. AmphibiaWeb; University of California, Berkeley, CA, USA. https://amphibiaweb.org/species/4322
• Serrano, J. M., Penna, M., Valenzuela-Sánchez, A., Mendez, M. A., & Azat, C. (2020). Monomorphic call structure and dimorphic vocal phenology in a sex-role reversed frog. Behavioral Ecology and Sociobiology, 74(10), 127. https://doi.org/10.1007/s00265-020-02903-3